A general model-based causal inference method overcomes the curse of synchrony and indirect effect

To identify causation, model-free inference methods, such as Granger Causality, have been widely used due to their flexibility. However, they have difficulty distinguishing synchrony and indirect effects from direct causation, leading to false predictions. To overcome this, model-based inference methods that test the reproducibility of data with a specific mechanistic model to infer causality were developed. However, they can only be applied to systems described by a specific model, greatly limiting their applicability. Here, we address this limitation by deriving an easily testable condition for a general monotonic ODE model to reproduce time-series data. We built a user-friendly computational package, General ODE-Based Inference (GOBI), which is applicable to nearly any monotonic system with positive and negative regulations described by ODE. GOBI successfully inferred positive and negative regulations in various networks at both the molecular and population levels, unlike existing model-free methods. Thus, this accurate and broadly applicable inference method is a powerful tool for understanding complex dynamical systems.

4. The authors mentioned the possible extension of the current work to the cases containing time delays. However, if the time delays are introduced into the systems, it seems that the current algorithm will not be effective because time delays are unknown, are more complicated, it is hard to locate the detection regions as defined in the current work. I suggest to include a case containing the time delays to show whether the current work is effective or not. Then a much clearer description for the future direction should be provided; otherwise, the impossible extension will also limit the usefulness of this work. 5. Actually, there are a series of methods for parameters estimation for ODE or even PDE models in control theory. Once one has the data, the methods could be directly used to estimate all the parameters and reconstruct the system. Frankly, it sometimes requires high computational cost, but it could be tolerated for the models that are used in the current work. More illustrative examples of high dimensions should be provided to show the possible outstanding advantages of the current work to those model-based methods.
6. Are the monotonic models as investigated in the current work sufficient representative? Are the non-or partially-monotonic models are more prevalent? This justifications also should be further provided. Is there any chance to extend the current work to those nonmontonic or even temporalstructured models?
Reviewer #2 (Remarks to the Author): The authors attempted to devise a general ODE framework to infer causal relationships between variables from time-series data, with the aim to circumvent limitations of model-free methods on false discoveries and to expedite computation over existing model-based methods that fit various families of specific functions to data so that the underlying mechanistic relationship can be learned. Pivoting on the assumption of monotonic pairwise relationship, the method developed here (GOBI) was shown to detect directed regulation and rule out indirect regulation. The authors tested the efficacy of their method on simulated datasets as well as experimental datasets, claiming that GOBI is accurate and broadly applicable.
1. There is a clash between the authors' claim of a general inference method and the details provided in the text, which focuses mainly on computational biology and regulatory networks. For example, in the introduction, model-based inference is presented only focusing on algorithms designed for regulatory networks, and popular and powerful methods such as Kalman Filter (that requires only a model, not restricting it to any class, and can handle both measurement and dynamical noise) are not even mentioned. Interaction inference is a broad field. Methodology and literature are vast and stretch well beyond the papers highlighted by the authors, which are very specific to regulatory networks. The authors should either broaden their view throughout the paper, including applications, or tune it to the designed audience and preferred realm of application (starting from the title).
2. The part of the Discussion section regarding the method limitations is very slim. There are limitations and assumptions here and there (such as thresholds when applying to noisy series) on which the authors should spend few words in the Discussion.
3. The paper has several statements that are just too generous towards the method, such as "our approach completely resolves the fundamental limit of model-based inference: strong dependence on a chosen model." The presented method can only be applied to a specific class of systems, so it sits somewhere between model-based inference and model-free inference. Furthermore, in the noisy case, it depends on two thresholds. Tuning down the text, eliminating these kinds of statements, and highlighting the limitation of the methodology does not take anything out of its value.
4. "In most biological systems, the degradation rates of molecules increase as their own concentrations increase; thus we assume that self-regulation is negative for every component in the system". This is another assumption that was not mentioned in presenting the method. If the method is good, this assumption is not needed. On the contrary, it would be an output of the inference. The authors should show that. 5. What happens in systems where the assumptions are not met? Is all inference messed up? Or only the links violating the assumptions are not trustable?
6. If I well understand, the thresholds for the network inference are derived a posteriori to optimize the inference performance. If so, this should be discussed as a limitation, at least in the Discussion. Which is the sensitivity of the results to the threshold values? And how should the user proceed to choose the thresholds if ground-truth data is not available? Some additional minor points: Sec II A: "positive or negative causation" maybe the authors meant "regulation" rather than "causation" &+) %% $# ',+ (54,123 3,15.* 34(4+ ).+(2.8 ,16 9 -3 +74+0*+* 41 4,2++ (0* /12+ *-/+03-103" ',+ reader can get the idea but being explicit is always better.
Sec IIC: The noise in the time series is measurement noise, not dynamical, isn't it? It should be stated clearly in the text. Also, it would be interesting to see, at least for a simple negative feedback loop of two nodes, what would be the impact of dynamical noise.
How much data is needed to run the inference? How does it depend on the order of the interaction? Is it suitable for large networks? I understand that the space to compute S goes down exponentially but I think the paper needs more quantitative statements Since the authors present an implementation, it would be nice to know the order of magnitude of the runtime In the supplementary material, the description of the noise level calculation could be improved. Just saying that the library uses a MATLAB functon(of which I couldn't find the documentation) without mentioning what the function does is a bit short, being that an ingredient of the calculation. Same for the MATLAB function "gradient". Is it applied before or after the noise filtering? I guess after, but it should be specified. And what is the underlying algorithm?
In the calculation of GC, I couldn't find how you chose the order of the AR processes for Y and X.
Speaking of GC, if you shift one series and test GC X(t) Â X(t+T) (Fig.4ab) and GC does not detect that as a link, I would be surprised, as the shifted timeseries becomes an AR process of its past, which is exactly what GC tests for. Maybe the authors could discuss more in depth why they chose to employ this test. Alternatively, I think that if they want a null hypothesis for non-connection, it could have more sense to use a series from another dataset (such as TetR & P & D).
1. Does the current algorithm still need the explicit forms of the ODE vector fields? In real applications, the explicit forms are always incomplete or even unclear. How to deal with this kind of cases? Can some functional bases be used to approximate the vector fields? This is a really important issue. An algorithm, which cannot deal with this issue, will be not that practical in applications.
Our algorithm, called General ODE-Based Inference (GOBI), does not require explicit knowledge of the ODE vector fields. Specifically, GOBI aims to infer the regulation of I by multiple components H, described by the general ODE model (Eq. (1)), where N represents any smooth function with increasing or decreasing behavior with respect to H . Unlike the majority of previous model-based inference methods, our algorithm (GOBI) does not require a specific form of N, making it highly practical for real applications. Thus, all inferences were performed without making assumptions about the explicit form of the ODE . This is an important aspect of our approach, which we have discussed in detail in the manuscript as follows: • Section I (page 1): "Here, we develop a model-based method that infers interactions among multiple components described by the general monotonic ODE model: }y } ± N(H) ± NºH g " H h " 4 " H u ), where N can be any smooth and monotonic increasing or decreasing functions of H and H u is I in the presence of self-regulation. Thus, our approach considerably resolves the fundamental limit of model-based inference: strong dependence on a chosen model." 2 2 • Section III (page 8): "… Importantly, as our method can be applied to any system described by general monotonic ODE (Eq. (1)), it significantly addresses the fundamental limit of the modelbased approach (i.e., requirement of a priori model accurately describing the system). …" 2. In the manuscript, noise has been taken into account to test the robustness of the developed algorithm. However, it is not clear what is the form of this multiplicative noise and which parameters are incorporated with this kind of noise and why the parameters are selected. Actually, there are many forms of noises, including the additive, the multiplicative noise and colored noise. What about the robustness of the current work to all these noises?
We thank the reviewer for this comment. In our study, we generated the noisy time-series data by introducing multiplicative noise to the time series simulated from ODE, which we discussed in the Methods section as follows: Section IV B (page 11): "… To introduce measurement noise in time series, we introduce multiplicative noise sampled randomly from a normal distribution with mean 0 and standard deviation given by the noise level. For example, for 10% multiplicative noise, we add the noise Furthermore, we have now tested the robustness of GOBI to various types of noises, including additive noise, colored noise, and dynamical noise, and discussed it as follows: Section II C (page 6): "… GOBI successfully infers regulatory networks from simulated time series using ODE models  in the presence of multiplicative noise (Fig. 3k) and other types of noise ( Supplementary Fig. 7a). Here, the F2 score, the weighted harmonic mean of precision and recall, is nearly one, indicating that GOBI is able to recover all regulations almost perfectly. However, it should be noted that noise types that significantly affect the shapes of trajectories can result in the decreased performance of GOBI that uses time-series shape information for inference (Supplementary Fig. 7b)." • Supplementary Information: Section X. The robustness of GOBI to various types of noise In the main text, we considered multiplicative noise as the measurement noise (Fig. 3). Here, we investigate the robustness of GOBI by introducing different noise models, such as additive noise, colored noise, and dynamical noise.
} denote the original time series that are normalized after being simulated from the ODE. To generate the noisy time series with the multiplicative noise level of U,, we randomly sample a noise c ± {c g " c h " 4 " c u } from a normal distribution with a mean of 0 and a standard deviation of 1 (i.e., c & A(0, 1), where Q ± (" )" 4 " A). Then, we multiply the noise c proportional to the noise level and add it to the original time series to obtain the noisy time series as follows: Next, we generate noisy time series with additive noise. Following a procedure used for generating time series with multiplicative noise, we randomly sample the noise c from a normal distribution A(0, 1). Then, we add the noise to the original time series proportional to the noise level as follows: Until now, we have only taken into account the effects of measurement noise. To make GOBI more widely applicable, we also need to consider the influence of dynamical noise. To generate noisy time series with dynamical noise, we add a noise term to the ODE. Specifically, let the dynamic of I be given as follows: Then, the noisy time series I }z|z ~ with the dynamical noise level of m% is simulated from the following ODE: where I ~z is the mean of the original time series I, and c is randomly sampled from a normal distribution A(0, 1).  (Supplementary Fig. 7a (iv) and (v)). Specifically, the performance of GOBI is greatly affected by additive noise and pink noise compared to other types of noise. Additive noise is added independently of the original signal's value, whereas multiplicative noise is added as a proportion of the signal. Thus, additive noise has a greater impact on the signal's overall shape and magnitude than multiplicative noise (Supplementary Fig. 7b (i) and (ii)). Additionally, pink noise has more power at low frequencies than blue noise, making it more likely to affect the shape of time series (Supplementary Fig. 7b (iii) and (iv) 20% level of noise is added (grey line) for each type of noise: multiplicative noise (b (i)), additive noise (b (ii)), pink noise (b (iii)), blue noise (b (iv)), and dynamical noise (b (v)). Then, the Fourier method is applied to fit the noisy time series (blue line).
3. It seems that this algorithm's accuracy crucially depends on the sampling rate for the collected experimental data. It is not clear the ceiling/down sampling rate for the accuracy in applications. More justifications need to be done. These justifications are also related to the computational cost of this algorithm.
We agree with the reviewer's comment that the accuracy of GOBI depends on the sampling rate of the collected experimental data, because a low sampling rate can result in time series with a different shape compared to data collected at a high sampling rate. To address this concern, we have now tested the accuracy of GOBI on experimental data by adjusting the sampling rates and found that it is less sensitive compared to the model-free methods as described below ( Supplementary Fig. 10). We have also discussed this issue in the Discussion section (see Reviewer 3's comment 2). Lastly, we have now included a discussion of the computational cost of GOBI in the Supplementary Information (see Reviewer 3's comment 11).
• Supplementary Information: Section XIII. The accuracy of GOBI on experimental data at different sampling rates Here, we investigate how the accuracy of GOBI and model-free methods (GC, CCM, and PCM) is affected by the sampling rates of experimental time-series data obtained from systems with known regulatory networks . After varying the sampling rates of each time-series data, we apply GOBI and model-free methods, including GC, CCM, and PCM, and compute the F2 scores of the inference results ( Supplementary Fig. 10b). Model-free methods infer a lot of false positive predictions regardless of the sampling rate because they often misidentify synchrony for causality (Supplementary Fig. 10c). While PCM can infer the true network structure (i.e., two independent feedback loops) of the prey-predator system merged with the genetic oscillator from the original experimental data, its accuracy drops dramatically when the sampling rate is reduced (Supplementary Fig. 10b (i)). However, GOBI successfully infers true network structures even when the sampling rate is halved . When the sampling rate is reduced by a quarter, which significantly changes the shape of the time series ( Supplementary Fig. 10a (i)-(ii) bottom), the accuracy of GOBI also decreases, but it is still comparable to that of model-free methods . This indicates that GOBI is robust to changes in the sampling rate as long as the shape of the time series is preserved. 4. The authors mentioned the possible extension of the current work to the cases containing time delays. However, if the time delays are introduced into the systems, it seems that the current algorithm will not be effective because time delays are unknown, are more complicated, it is hard to locate the detection regions as defined in the current work. I suggest to include a case containing the time delays to show whether the current work is effective or not. Then a much clearer description for the future direction should be provided; otherwise, the impossible extension will also limit the usefulness of this work.
We have now extended GOBI to the case containing time delays. In particular, we have investigated how GOBI can be used to infer time-delayed causal interactions using a simple example taken from [Glass, D.S. et al., Nat Commun, 2021] as described below ( Supplementary  Fig. 12). We have also discussed this in the Discussion section (see Reviewer 3's comment 2).
• Supplementary In the presence of positive regulation from H to I with time delay e x , < x y ([" [ , " e x ) is always positive where H } ([" [ , " e x ) > 0, and its normalized integral (i.e., extended regulation-detection score with a time delay), is one.
This idea can be applied to the case of delayed negative regulation. In the presence of negative regulation H 3 I with time delay e x , the extended regulation-detection function with a time delay, can capture the positive relationship between ¯Hº[¯e x ) and I 5 ([). Thus, < x y ([" [ , " e x ) is always positive where ¯H } ([" [ , " e x ) > 0, and its normalized integral, is one. Now, we illustrate how to infer time-delayed causal interactions using a simple example, taken from (Glass, D.S. et al., Nat Commun, 2021), where H negatively regulates I with time delay e x = 3 ( Supplementary Fig. 12a). To infer the delayed regulation, we use the extended regulation-detection score with a time delay (Eq. (S2) and (S3)). We compute F x y ºe» for each time-series data with different values of e, varying from 0 to 5 ( Supplementary Fig. 12b). F x y (e) is always one only when the e is equal to e x , whereas if e is not equal to e x , F x y (e) is not always one. Thus, GOBI can be used to infer delayed regulation and detect the corresponding time delay. This idea can be extended to cases with multi-dimensional regulation, which would be an interesting future direction. 5. Actually, there are a series of methods for parameters estimation for ODE or even PDE models in control theory. Once one has the data, the methods could be directly used to estimate all the parameters and reconstruct the system. Frankly, it sometimes requires high computational cost, but it could be tolerated for the models that are used in the current work.
More illustrative examples of high dimensions should be provided to show the possible outstanding advantages of the current work to those model-based methods.
We agree with the reviewer that the computational cost of the model-based methods can be tolerable with the advance in the algorithms and GPU/CPU. However, the major drawback of the model-based methods is that they heavily rely on the selection of a model because choosing an unsuitable model can result in false predictions. We have discussed this in the manuscript as follows: • Section I: "… Although testing the reproducibility is computationally expensive, as long as the underlying model is accurate, the model-based inference method is accurate even in the presence of synchrony in time series and indirect effect [21][22][23][24][25][26][27][28][29]. However, the inference results strongly depend on the choice of model, and inaccurate model imposition can result in false positive predictions, limiting their applicability. …" We have now demonstrated how GOBI can offer advantages over the model-based approach by providing a simple example as shown below ( Supplementary Fig. 4). We have also discussed this issue in the Results section and Discussion section (see Reviewer 3's comment 2). 9 9 • Section III: "… Taken together, our method successfully infers regulatory networks from various in silico systems regardless of their explicit forms of ODE by assuming a general monotonic ODE (Eq. (1)). Unlike our approach, model-based methods that require specifying the model equations produce inaccurate inferences if inappropriate functional bases are chosen ( Supplementary Fig. 4)." • Supplementary Fig. 4a (i)-(ii)). Here, 5 positively regulates 6 since g i 5 i¯5h ® 5 ® ( is an increasing function with respect to 5. With the ODE describing the network, we simulate a time-series data with initial condition of 6(0) = 1 and input signal 5 ( Supplementary Fig. 4a (iii)). The time series of input signal 5 is constructed by connecting 51 randomly selected points from A(1, 0.5) over the time domain [0,50] with the spline fitting.
Suppose we do not know the underlying dynamics of this system and use the following hill-type functions to test for the existence of regulation from 5 to 6 (Supplementary Fig. 4b (i)), which has been widely used in the previous model-based inference (Gotoh, T. et al., Proceedings of the National Academy of Sciences, 2016;Lillacci, G. & Khammash, M., PLoS computational biology, 2010;Toni, T. et al., Journal of the Royal Society Interface, 2009): accurate predictions of causal relationships using model-based methods, which can limit the effectiveness of these approaches.
We then apply GOBI, which does not need to specify underlying regulatory functions, to the same data. First, we employ the moving-window technique to divide the time series into 41 different segments with a length of 10, where each segment overlaps with the previous one by 10%. From each time-series data, we compute the regulation-detection scores for all possible 1D and 2D regulations (Supplementary Fig. 4c  Supplementary Fig. 4. Comparing model-based method and GOBI for inferring a regulatory network. a The ODE (a (i)) describes a regulatory network (a (ii)) with two components. Here, 5 positively regulates 6 (i.e., g i 11 method is tested with assuming two different hill functions (`n § t kn § and a t t kn ¦ ) to describe the positive and negative regulation from 5 to 6 (b (i)). We use simulated annealing to estimate seven parameters `" a" b" > g " > h " V" and U. Here, parameters are selected within a range of 20], and V" U / [1,5]. For 100 pairs of parameters that reproduce the time-series data (b (ii)), the values of ` and a are similar and positive (b (ii)) indicating that the regulation from 5 to 6 is a mixture. Thus, the model-based method fails to inter the network structure. c We then apply GOBI to infer the network structure. To apply GOBI, we first use the moving-window technique with a window size of 10 and an overlapping ratio of 10% to divide the time-series data of 5 and 6 into 41 different segments. Then, from each time-series data,

and one 2D regulation (c (ii) right). This 2D regulation passes the _ test and GOBI successfully infers the network structure.
6. Are the monotonic models as investigated in the current work sufficient representative? Are the non-or partially-monotonic models are more prevalent? These justifications also should be further provided. Is there any chance to extend the current work to those non-monotonic or even temporal-structured models? Determining whether monotonic systems are more prevalent than non-or partially-monotonic systems is challenging. However, it is evident that monotonic regulation plays a crucial role at various scales. First, at the molecular level, gene regulation is often monotonic. For example, the binding (unbinding) of specific proteins, known as activators (repressors), to DNA, is necessary for transcription to begin (stop), indicating positive (negative) causal interaction. Additionally, there are numerous monotonic relationships among molecules, such as hormones (Waadt, R. et al., Nat Rev Mol Cell Biol, 2022;Mazziotti, G., Giustina, A., Nat Rev Endocrinol, 2013) and metabolites (Baker, S.A., Rutter, J., Nat Rev Mol Cell Biol, 2023). Next, in epidemiology, certain variables can positively (or negatively) influence related diseases. For example, standing water serves as a breeding ground for mosquitoes that transmit the dengue virus, so rainfall positively affects the incidence of dengue fever. Also, high temperatures expedite mosquito development, leading to a positive relationship between temperature and the outbreak of dengue fever (Alto BW, Bettinardi D., The American Society of Tropical Medicine and Hygiene, 2013; Lai, YH., BioMed Eng OnLine, 2018). Lastly, monotonic relationships are 12 observed in the climate system, such as a strong positive feedback effect between temperature and greenhouse-gas concentrations (Egbert H. et al., Nat. Clim. Change, 2015).
Nevertheless, we agree with the reviewer that non-or partially-monotonic systems also play important roles in the real world. Thus, we have now illustrated how GOBI can be potentially used to detect non-monotonic regulation. Specifically, we have described how GOBI can identify the types of regulation in a temporal-structured system, where the regulation type is changed from positive regulation to non-monotonic regulation, negative regulation and the absence of regulation, as described below ( Supplementary Fig. 11). We have also discussed this issue in the Discussion section (see Reviewer 3's comment 2).

• Supplementary Information: Section XIV. Extension to temporal-structured model including non-monotonic regulation
Here, we illustrate an example of how GOBI can be applied to a temporal-structured system that includes non-monotonic regulation. To construct a temporal-structured regulation from 5 to 7 (Supplementary Fig. 11a (i)), the values of parameters (`, a, and b) are varied over time (Supplementary Fig. 11a (ii)), resulting in different types of regulation from 5 to 7: positive regulation in the first quarter, non-monotonic regulation in the second quarter, negative regulation in the third quarter, and the absence of regulation in the last quarter. We simulate 100 time-series data of 7 from different input signals 5 and 6 ( Supplementary Fig. 11b). The time series of input signals are constructed by randomly selecting 200 points from [0, 1] over the time domain and connecting them with the 'spline' method.
Next, we use a moving-window technique with a window size of 5 and an overlapping ratio of 50% to segment the time-series data of 5, 6, and 7. For each window, regulation-detection scores for positive and negative regulation (F n p p and F n p p ) are computed from the 100-time series (Supplementary Fig. 11c). The criterion F n p p = 1 (F n p p = 1) is satisfied in the presence of positive (negative) regulation (Supplementary Fig. 11c, the first and third quarter). However, in the presence of non-monotonic regulation or the absence of regulation, neither criterion is satisfied, indicating that the regulation-detection score cannot distinguish between these two cases ( Supplementary Fig. 11c, the second and last quarter). In particular, when the strength of both positive and negative regulation is similar, the regulation-detection scores are around zero (the middle of the second quarter in Supplementary Fig. 11c), which is similar to the case of the absence of regulation.
On the other hand, the regulation-detection functions, instead of their normalized integrals (i.e., regulation-detection scores) differ between non-monotonic regulation and the absence of regulation. To distinguish between these cases, we utilize the regulation-detection functions < n p p and < n p p , which correspond to positive and negative regulation from 5 to 7, respectively. These functions, < n p p and < n p p , are defined on the regulation-detection regions E n p (bottomright triangle, Supplementary Fig. 11d) and E n p (top-left triangle, Supplementary Fig. 11d), 13 13 respectively. We compute < n p p and < n p p for each time series window using all available data. As expected, the presence of positive and negative regulations resulted in positive values for < n p p and < n p p on whole their domains, respectively (Supplementary Fig. 11e (i) and (iii)). On the other hand, in the presence of non-monotonic regulation or the absence of regulation, the regulation-detection functions have both positive and negative values (Supplementary Fig. 11e (ii) and (iv)). However, the patterns of the mixture of positive and negative values are completely different between non-monotonic regulation (Supplementary Fig. 11e (ii)) and the absence of regulation (Supplementary Fig. 11e (iv)). Specifically, in the absence of regulation, the sign of the regulation-detection function varies inconsistently across the time-series data, resulting in a completely mixed sign of the regulation-detection function (Supplementary Fig. 11e (iv)). In contrast, in the presence of non-monotonic regulation, the sign of the regulation-detection function is consistent across the time-series data (Supplementary Fig. 11e (ii)). Taken together, We quantify this consistency to distinguish between non-monotonic regulation and the absence of regulation. We first divide the regulation-detection regions (E n p and E n p ) into triangles of equal shape ( Supplementary Fig. 11f left). Then, we count the number of positive values of < n p p and < n p p at each triangle, and normalized this count by the total number of positive values to approximate the probability distribution of positive values (Supplementary Fig. 11f right,P). Similarly, we approximate the probability distribution of negative values (Supplementary Fig. 11f right,N). Here, we use the results of regulation-detection functions on the window in the presence of non-monotonic regulation as an example (Supplementary Fig. 11e (ii)). Finally, we quantify the similarity between the probability distribution of positive and negative values (P and N) using the Jensen-Shannon Divergence (JSD) : A high (low) value of JSD indicates that the regulation-detection function has consistent (inconsistent) sign throughout the region. The similarity, measured by JSD, is much higher when non-monotonic regulation is present compared to the absence of regulation (Supplementary Fig. 11g). This suggests that the consistency test based on JSD can be used to distinguish between non-monotonic regulation and the absence of regulation.
We next investigate whether the proposed consistency test can distinguish between nonmonotonic regulation and the absence of regulation when experimental data is given. Specifically, among the eight different time-series data of the genetic oscillator (Fig. 4b), we use four time series with a similar range of d hj (Supplementary Fig. 11h Fig. 11i ). On the other hand, in the absence of regulation, the sign of the regulation-detection function varies across the data ( Supplementary  Fig. 11i ). Consequently, JSD is much higher for non-monotonic regulation than for the absence of regulation ( Supplementary Fig. 11j and ), indicating that the consistency test can distinguish between non-monotonic regulation and the absence of regulation.
Supplementary Fig. 11. Extended framework of GOBI to distinguish non-monotonic regulation from the absence of regulation. a As described by the ODE (a (i)), the time-varying values of parameters (`, a, and b) (a (ii)) result in different types of regulation from 5 to 7: positive regulation in the first quarter, non-monotonic regulation in the second quarter, negative regulation in the third quarter, and the absence of regulation in the last quarter. b With the ODE describing the network (a (i)), 100-time series are simulated from different initial conditions and one of them is shown. Then, the Jensen-Shannon Divergence (JSD) is used to measure the similarity between two probability distributions (f bottom-right). Here, 32 number of partitions are used. Also, to illustrate the P and N, the results of regulation-detection functions on the window in the presence of non-monotonic regulation are used (e (ii)). g The JSD is higher in the presence of non-monotonic regulation than in the absence of regulation. h-j This strategy is applied to the time-series data of the genetic oscillator (h) by comparing the regulation-detection function in the presence of positive regulation (i ), non-monotonic regulation (i ), and the absence of regulation (i ). JSD is higher in the presence of non-monotonic regulation than in the absence of regulation (j).

Reviewer: 2
The authors attempted to devise a general ODE framework to infer causal relationships between variables from time-series data, with the aim to circumvent limitations of model-free methods on false discoveries and to expedite computation over existing model-based methods that fit various families of specific functions to data so that the underlying mechanistic relationship can be learned. Pivoting on the assumption of monotonic pairwise relationship, the method developed here (GOBI) was shown to detect directed regulation and rule out indirect regulation. The authors tested the efficacy of their method on simulated datasets as well as experimental datasets, claiming that GOBI is accurate and broadly applicable.
Recent years has witnessed a number of endeavors for developing a general ODE framework to infer causal relationship from time-series data (PMID: 27698038, 34675223, 36476735). The theoretical development presented in this study is straightforward but refreshing. If done properly, it could very well be a generic tool for causal inference using time-series data. Unfortunately, the current manuscript has several fundamental issues, raising concerns about the validity and applicability of the proposed method. Below are some general comments.
We thank the reviewer the positive feedback about the potential of GOBI as a generic tool for causal inference. Furthermore, we appreciate the reviewer for constructive and detailed comments, which greatly improves the applicability and readability of the manuscript. We have also cited the suggested relevant works in the manuscript as follows: Section I: "… However, the inference results strongly depend on the choice of model, and inaccurate model imposition can result in false positive predictions, limiting their applicability. To overcome this limit, inference methods using flexible models were developed [30-36;Xie, X., Samaei, A., Guo, J. et al., Nat Commun, 2022;Chen, Z., Liu, Y. & Sun, H., Nat Commun, 2021;and Tegnér, J. et al., Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2016]. …" 1. The assumption of monotonicity greatly limits the applicability of GOBI. It is ubiquitous that biological variables interact to influence their dynamics, meaning that monotonic relationship (Equation 1) is extremely rare in real-world scenarios. Since this assumption serves as the foundation for causal inference in GOBI (see Supporting Information Section 1), violations could also confine the validity of the method. Consider a predator-prey model with an interaction term (Y'=X*Y -Y). Due to the non-monotonicity in X and Y' and the non-independence between X and Y, theorem 1 does not hold and the regulation detection score is bound below 1. This ensures false negative discoveries. Moreover, multivariable monotonicity has not been defined formally.
We acknowledge the reviewer's concerns about the assumption of monotonicity in GOBI, which could potentially limit its applicability. However, it is important to note that the primary goal of GOBI is not only to infer causality, but also to determine the type of causation, whether positive or negative. To achieve this, we have focused on monotonic systems. Moreover, monotonic relationships are common across various scales (see Reviewer 1's comment 6 for details).
Nevertheless, we have now extended GOBI to distinguish between non-monotonic regulation and the absence of regulation (see Reviewer 1's comment 6 for details), which can increase the applicability of our method in cases where the assumption of monotonicity is not valid. For example, in the case of the prey-predator model, the predator population (I) dynamics can be described as I 5 ± NºH" I» ± H -I¯I, where H represents the prey population. As H and I are both positive, N is a monotonic increasing function with respect to H and a non-monotonic function with respect to I, reflecting the growth and death of the population. Using our current algorithm, we can only identify the monotonic regulation from H to I. However, with our extended algorithm, we can also detect the non-monotonic regulation of I itself, which is crucial for capturing the complex dynamics of the system. Also, as the reviewer pointed out, multivariable monotonicity has not been defined formally. We unintentionally omitted a clear description of multivariable monotonicity. To resolve this, we have now revised the manuscript as follows: " Supplementary Information (Section I): "If there is a AD regulation from H ± ºH g " H h " 4 " H u ) to I, then the dynamic of I is a given as }y } ± N(H) ± NºH g " H h " 4 " H u ).

2.
Most examples illustrated in the manuscript have the form of linear combination of singlevariable monotonic functions (Fig. 1, Fig. 2a/b/d/e, and Fig. 3). It seems the authors were trying to avoid the aforementioned issues. For the cAMP oscillator (Fig. 2f), where there were interaction terms, the supplementary data indicated that GOBI detected a false regulation (ACA " cAMPe) and missed a strong prediction (ACA " cAMPi). These mishaps were omitted in the main figure and not discussed in the main text. Nevertheless, the authors claimed that they recovered the relationship network successfully, prompting necessary doubt over their implementation. Further, the part of inference with noisy data (Fig. 3) was not included in the code, making it impossible to verify or replicate.
We acknowledge that most of our examples in the manuscript contained only a linear combination of single-variable monotonic functions. However, we did not intend to avoid the inevitable issues associated with multivariable monotonicity, but have merely chosen the popular biological oscillatory models, which were considered in previous inference studies (Tyler, J., Forger, D. & Kim, J. K., Bioinformatics, 2022; Pigolotti, S., Krishna, S., & Jensen, M. H., Physical review letters, 2009). In the presence of interaction terms (i.e., multivariable monotonicity), GOBI can still infer a regulatory network. For instance, let us consider a network including three different interaction terms (Fig. R1a): the multiplication of two variables (5 -7 i ), the multiplication of a hill function and a single variable ( g f$fgko -8), and the multiplication of two different hill functions ( n gkn ' g f$fgkp ). With the ODE describing the network, we simulate 100 time-series data from different initial conditions. Next, from each time series, we compute the regulation-detection score for every 1-3D regulation (Fig. R1b   predictions via the _ test. For example, 5 . 7 . 8 . 6 and 5 . 7 . 8 3 6 are falsepositive because _ n p o (8) = 0. By merging the inferred 2D regulations, GOBI successfully infers the network structure in the presence of interaction terms.
Next, the reviewer pointed out that our supplementary data indicates that GOBI detected a false regulation (575 K5@CM) and missed a strong prediction (575 . K5@CQ). We believe the reviewer may have misinterpreted our results. First, 575 . K5@CM, which was inferred by GOBI, is not a false regulation [Maeda, M. et al., Science, 2004]. Second, we believe the reviewer thought that GOBI did not infer 575 . K5@CQ from 1D inference results, but it was inferred via 2D inference (i.e., 575 . EMO5 3 K5@CQ). Specifically, to understand the Supplementary Data 1, it is important to note that GOBI infers multi-dimensional regulations, which implies that each AD framework of GOBI only infers A dimensional regulations. As the reviewer mentioned, 575 . K5@CQ was not inferred based on the results of the 1D framework in Supplementary Data 1. However, when the 2D framework was performed, 575 . K5@CQ was inferred as a component of the 2D regulation 575 . EMO5 3 K5@CQ. Thus, 575 .
K5@CQ was included in the inferred network structure.
Nevertheless, while reporting the results, we unintentionally presented the positive regulation from 575 to K5@CM as negative in Figure 2f. Thus, we have revised the Figure 2f as follows.
Lastly, in response to the reviewer's concern regarding the code for Figure 3, we have now submitted all the codes used in our manuscript with detailed annotations (see Reviewer 2's comment 4 for details).
3. The manuscript also contains multiple cases of inconsistency and contrived arguments, severely reducing readers' confidence in the contents. For instance, on page 5, the authors mentioned that noise made it difficult for the regulation-detection score to distinguish direct and indirect regulations, thus they needed another criterion for inference, i.e., the surrogate test and combining the p-values. Occasions like this leave one to ponder how limited the method actually is.
Our framework, GOBI, comprises three main steps: regulation-detection score, delta test, and surrogate test. In order to enhance readability, we have explained our framework in a step-bystep manner throughout the current manuscript. In Fig. 1 and 2, we demonstrated that the regulation-detection score and delta test are sufficient to infer regulations from simulated timeseries data in the absence of noise. However, in the presence of noise, we introduced the 20 20 surrogate test as an additional step to handle noisy data. Thus, the sentence mentioned by the reviewer does not indicate a limitation of GOBI, but rather serves as a transition to explain the incorporation of the surrogate test into GOBI.
4. It is far from clear how GOBI would be a user-friendly tool for the wider community. The analyses on simulated datasets (Fig. 2) appear to involve ad hoc selection that requires domainspecific knowledge or personal discretion (see supplementary data). Inference using experimental data (Fig. 4) was not described in details, and there was no associated supplementary data. It is not entirely obvious how the authors arrived at their findings, other than the schematic summaries. In addition, the code was not annotated for easy adoption and the user's manual is missing.
As the reviewer mentioned, the analysis on simulated data sets involved assuming negative self-regulation as prior knowledge. However, assuming the types of self-regulations is just a useful option for users to accurately infer network structures with limited data. GOBI can successfully infer the network structures without assuming negative self-regulation. We have now illustrated this more clearly (see Reviewer 3's comment 4). Additionally, we have provided detailed explanations of the inference process using experimental data in the Supplementary Information, including results from three different steps in GOBI: total regulation score, delta test, and surrogate test ( Supplementary Fig. 9).
Nevertheless, we agree with the reviewer's comment that our codes were not annotated for easy adoption. To address this concern, we have now submitted all the codes used in our manuscript with detailed annotations. Furthermore, we have now included a user manual with a simple example to help users adopt GOBI more easily. Now, to use GOBI, users need to do the following tasks after uploading the data, which are easy to do, as seen below. 1) Users can optionally choose the thresholds for the regulation-detection region, regulationdetection score, and total regulation score, as well as the critical values for the ( test and the surrogate test. To assist users in selecting those values, we have provided guidelines (see Reviewer 2's comment 7 and Reviewer 3's comment 6 for details). Thus, users can use our guidelines as defaults for inference or make adjustments depending on whether the goal is to decrease false positive or negative predictions.
2) Users can optionally incorporate the available prior knowledge into the inference (see Reviewer 3's comment 4 for details). Of course, it is possible to run the inference without any prior knowledge, but incorporating such knowledge is beneficial.
3) Users can select the maximum dimension of the framework for inference. Typically, if the system of interest consists of A components, it is recommended to run the inference up to an A¯( dimensional framework (A dimensional framework including self-regulations). However, this recommendation is not always feasible because inferring high-dimensional regulation requires a large amount of data. To assists users in choosing an appropriate dimension of the framework, we provided guidelines to check whether the current data is sufficient for running the 21 21 inference of a specific dimension (see Reviewer 3's comment 10). Thus, users can easily select the maximum possible dimension of the framework based on the available data.
Based on this, we have now included the user manual in the Supplementary Information as follows: • Supplementary Information: Section XVIII. Manual for the GOBI computational package Here, we provide a manual for a computational package, GOBI (General ODE-Based Inference), to infer a network structure from time-series data. To help user's understanding, we use an illustrated example of the repressilator (Fig. 4c). Supplementary Fig. 14). Supplementary Fig.  14). This function interpolates the data based on the interpolation method specified by the user and cuts the data into windows using moving window technique, where the window size and overlapping ratio are defined by the users as follows.

(b) Users need to choose the sampling rate for the interpolation. The parameter 'time_interval' indicates how finely the users wants to interpolate the original time series. For example, 'time_interval = 0.5' indicates interpolation using a time interval twice as fine as the original time series. Selecting 'time_interval' to make about 100 time points per period is recommended, and please note that the low value of 'time_interval' (high sampling rate) makes the inference accurate, but slow as well.
(c) For the data segmentation, users need to specify the parameters for the moving window technique, i.e., window size and overlapping ratio. The parameter 'window_size_ori' defines the number of time points in each window. Then, along the time series, we move the window until the next window overlaps with the current window by the ratio defined in the parameter 'overlapping_ratio' ('overlapping_ratio = 0.1' as a default). For oscillatory time-series data, it is recommended to choose the window size as one period. The time series in every window is saved at the variable 'y_total'.
After interpolation and data segmentation, the data is saved in 'data_cut.mat' file. Supplementary Fig. 14). This code integrates options that can be adjusted by the users or set via our guidelines.

Update the 'Step0_options.m' file (see Options in
(a) Users need to specify the thresholds for regulation-detection region ('thres_R'), regulationdetection score ('thres_S'), and total regulation score ('thres_TRS') as well as the critical values for ( test ('p_delta = 0.01' as defaults) and surrogate test ('p_surrogate = 0.001' as defaults). To assist users in selecting those values, we have provided guidelines based on the noise level of data ( Supplementary Fig. 5). Thus, users can use our guidelines as defaults or make adjustments depending on whether the goal is to decrease false positive or negative predictions. Fig.  13).

(b) The parameter 'type_self' defines options for the types of self-regulation: no assumption ('type_self = NaN'); negative self-regulation ('type_self = -1'); no self-regulation ('type_self = 0'); and positive self-regulation ('type_self = 1'). Also, users can optionally incorporate other available prior knowledge into the inference (Supplementary Fig. 3). Of course, it is possible to run the inference without any prior knowledge, but incorporating such knowledge is beneficial when the amount of data is limited. (c) The parameter 'max_D' defines the maximum dimension of the framework for inference. Typically, if the system of interest consists of A components, it is recommended to run the inference up to an A¯( dimensional framework (A dimensional framework including selfregulations). However, this recommendation is not always feasible because inferring highdimensional regulation requires a large amount of data. To assist the users in choosing an appropriate dimension of the framework, we have provided the guidelines to check whether the current data is sufficient for running the inference of a specific dimension (Supplementary
All the options are saved in the 'data_with_options.mat' file. Also, during the inference, our framework automatically gives a warning signal when the data is insufficient to run the framework. Then, users should adjust these options. Supplementary Fig. 14).

(a) Run the 'Step1_compute_RDS_dim1.m' function. First, this function finds all the possible 1D regulations and saves them at the variable 'component_list_dim1'. Each row of component_list_dim1' indicates the set of causal variable (C) and target variable (T). For each pair (C and T), regulation-detection region and score are computed for all the regulation types (+ and 2) using time-series data, and they are saved at the variables 'R_total_list' and 'S_total_list'. Those values are saved in the 'RDS_dim1.mat' file. (b) Run the 'Step1_compute_TRS_dim1.m' function. Using the 'thres_R' and 'thres_S' that users specified, Total Regulation Score (TRS) is computed for each possible 1D regulation. As a result, the heatmap of TRS is displayed, and the exact values of TRS are saved at the variable '
TRS_total' in the 'TRS_dim1.mat' file. In this heatmap, each row indicates the possible 1D regulation (C and T) and each column indicates the regulation type (+ and 2). Using the 'thres_TRS' that users specified, 1D regulations are inferred.
(c) Run the 'Check1D.m' function. This function checks whether the data is sufficient to confidently infer 1D regulations. If the warning signal comes out, users are recommended to stop the inference or adjust the options. Supplementary Fig. 14).

Run the codes for 2D framework (see Step 2 in
(a) Run the 'Step2_compute_RDS_dim2.m' function. First, this function finds all the possible 2D regulations and saves them at the variable 'component_list_dim2'. Each row of component_list_dim2' indicates the set of two causal variables (7 g and 7 h ), and target variable (T). For each set (7 g " 7 h " G), regulation-detection region and score are computed using timeseries data for all the regulation types (61" 17" 61" 27" 62" 17, and 62" 27) and they are saved at the variables 'R_total_list' and 'S_total_list'. These values are saved in the 'RDS_dim2.mat' file.

(b) Run the 'Step2_compute_TRS_dim2.m' function. Using the 'thres_R' and 'thres_S' that users specified, Total Regulation Score (TRS) is computed for each possible 2D regulation. As a result, the heatmap of TRS is displayed, and the exact values of TRS are saved at the variable '
TRS_total' in the 'TRS_dim2.mat' file. In this heatmap, each row indicates the possible 2D regulation (7 g " 7 h " G) and each column indicates the regulation type (61" 17" 61" 27" 62" 17, and 62" 27). Using the 'thres_TRS' that users specified, candidates for 2D regulations are inferred.  Supplementary Fig 14. Sample input and output for the GOBI package based on the experimental repressilator example (Fig. 4b). The 'data.mat' file contains the time points ('t') and the time-series data for each variable ('y'). Then, the time series are interpolated using the 'spline' method and cut using the moving window technique. Next, users have options regarding hyper-parameters, types of self-regulation, and the maximum dimension of the framework. Here, we use default values for hyper-parameter, assuming negative self-regulation, and specifying 'max_D = 2'. With these options, our 1D framework computes TRS and infers 1D regulations. Also, our 2D framework computes TRS and performs ( test and surrogate test to infer 2D regulations. By merging the inferred regulations, a network structure is reconstructed. During the inference, our framework automatically gives a warning signal when the data is insufficient. 26 26 Specific Comments: 5. Fig. 1a: Notations are misplaced. Zd should be Xd, and X'd should be Y'd 6. Fig.1d-l: It is not clear what the color bar refers to.

(c) Run the 'Step2_Delta_test_dim2.m' function. For every candidate for 2D regulations (Inferred from 5-(b)), this function performs the ( test for each causal variable (C1 and C2). If the number of data is smaller than 25, then this function tests whether the signs of regulation-delta functions are non-negative or not. If the number of data is larger than 25, this function performs
We thank the reviewer for bringing to our attention the misplaced and omitted notations in Figure 1. The color bars in the figure represent the value of the regulation-detection functions. Based on this, we have revised Figure 1 as follows: 7. Fig. 2a and 3i: Why was the critical value for the Delta test 0.01 and 0.001 for the surrogate test? Were they derived from benchmarking?
The critical values for the Delta test and surrogate test were not derived from the benchmarking study. Instead, we determined the optimal critical values for inferring true network structures from noisy simulated data (Fig. 3). Then, we used these optimized values to infer causation from experimental time-series data ( Fig. 4 and 5). Note that the critical value for the surrogate test is more stringent than that of the Delta test because the surrogate test aims to distinguish between direct and indirect regulations, which the Delta test cannot do in the presence of high levels of noise ( Supplementary Fig. 6). While we recommend using these values as the default in GOBI, depending on whether the goal is to decrease false positive or negative predictions, one can adjust the threshold (i.e., choose more or less strict critical values). 27 27 8. Fig. 3c: The blue line should be 1 for low noise levels according to Fig. 3b. It is not addressed why the regulation of A -> C was not detected using TRS thres.
We thank the reviewer for pointing out the errors in Figure 3. As the reviewer mentioned, the blue line, representing the total regulation score for 5 3 6, should be one for low noise levels. Based on this, we have revised Figure 3 as follows.
Next, as pointed out by the reviewer, the positive regulation 5 . 7 did not meet the 1D criteria for TRS (GEF x ¬ y ² GEF ~ ). Thus, our 1D framework only inferred 5 3 6, which met the criteria for TRS. However, after applying the 2D framework, 2D regulation 5 . 6 . 7 is inferred, which met the 2D criteria for TRS. By merging the inferred 1D and 2D regulations, we can successfully reconstruct the network structure of IFL. This highlights the need for multidimensional inferences as 5 . 7 can be inferred with 2D inference but not 1D inference. To help the reader's understanding, we have now revised the manuscript as follows: Section II. B: "… By merging the inferred 1D and 2D regulations, the regulatory network is successfully inferred. Here, note that the regulation 5 . 7 is not detected by the 1D regulationdetection score since 7 has multiple causes. However, the 2D regulation-detection score detects 5 . 6 . 7, which contains 5 . 7. This demonstrates the need for multi-dimensional inferences, as the 1D criteria alone would not have been sufficient to fully capture the regulatory relationships in the network. Since this system has three components, we infer up to 2D regulations. If there are A components in the system, we go up to (A -1)D regulations ( Supplementary Fig. 2)." Section II. C: "In the presence of noise in the time-series data, the regulation-detection score (F x ¬ y ) is perturbed. Thus, F x ¬ y may not be one even if there is a regulation type d from H to I. For example, in the case of an Incoherent Feed-forward Loop (IFL) which contains 5 3 6 (Fig. 3a), F n o is always one in the absence of noise (Fig. 2a Step 1 blue), but not in the presence of noise (Fig. 3b blue). Thus, for noisy data, we need to relax the criteria F x ¬ y = 1 to F x ¬ y ² F ~ where 28 28 F ~ < 1 is a threshold. Because F n o gets farther away from one as the noise level increases, F ~ should also be decreased accordingly. We choose F ~ +6"$#%"<"'$''*°ºVWQZM TM\MT» with which true and false regulations can be distinguished in the majority of cases for our previous in silico examples (Fig. 3b and Supplementary Fig. 5e). For instance, F ~ (green dashed line, Fig. 3b) overall separates true regulation (Fig. 3b blue) and false regulation (Fig. 3b  red). Here, we choose 5 . 7, which has the highest score among all false positive 1D regulations (Fig. 2b red).
We found that the fraction of time-series data satisfying F x ¬ y ² F ~ , which we refer to as the Total Regulation Score (TRS) (Fig. 3c left), more clearly distinguishes the true (Fig. 3c right,blue) and false positive (Fig. 3c right, red) regulations. Thus, we use the criteria GEF x ¬ y > GEF ~ to infer the regulation. Similar to F ~ , GEF ~ also decreases as the noise level increases. Specifically, we use GEF ~ ± '$+¯'$'(°ºVWQZM TM\MT», which successfully distinguishes between the true and false regulations of IFL (Fig. 3c right) and in silico systems investigated in the previous section ( Supplementary Fig. 5f). …" 9. Fig. 4d-e dashed boxes: Why are regulations not inferred when they share a common target? Is it another limitation of the method?
We appreciate the reviewer for pointing out that 1D regulations are not inferred when they share a common target, which is an important aspect of the framework. Please note that this is not a limitation of GOBI, but rather a designed feature. Specifically, GOBI omits inferred regulations when they do not match the dimension of the framework. In the example of cofactors at the estrogen-sensitive XF) promoter (Fig. 4d), our 1D criteria GEF x ¬ y ² GEF ~ inferred 1D regulations ;857 3 P9E and GE<C( 3 P9E. These inferences indicate that P9E is negatively regulated by two components ;857 and P9E, which is a 2D regulation. However, since our 1D framework only guarantees a single cause for each target, we discarded these inferred 1D regulations. If both components are indeed effective together, i.e., 2D regulation ;857 3 GE<C( 3 P9E is present, those will be inferred from the 2D framework. However, with the 2D inference framework, the 2D regulation was not inferred from GOBI because GE<C( 3 P9E was identified as an indirect regulation. Thus, by merging results of 1D inference (;857 3 P9E or GE<C( 3 P9E if P9E is regulated by a single cause) and 2D inference (the absence of GE<C( 3 P9E regulation), we concluded that the inferred 1D regulation ;857 3 P9E is the only regulation. This is why we need to perform 2D inference rather than simply merging two 1D inferred regulations. This issue has been now more clearly described in the manuscript as follows: • Section II. D: "GOBI infers five 1D regulations (;857 3 P9E, GE<C( 3 P9E, P9E . CB?<<, GE<C( 3 CB?<<, and ;857 3 CB?<<) that satisfy the criteria GEF x ¬ y y ² GEF ~ . However, we exclude them because P9E and CB?<< have two and three causes, forming 2D and 3D regulations, respectively, although the 1D criteria assumes a single cause (Fig. 4d middle,dashed box). If all of these regulations are effective, they will be identified as 2D and 3D 29 29 regulations. Indeed, among the 11 candidates for 2D regulations, most of them include the five inferred 1D regulations. Via _ test and surrogate test, indirect regulations are identified among inferred 2D regulations ( Supplementary Fig. 9d). For example, P9E . ;857 3 CB?<< satisfies the criteria GEF x ¬ y y ² GEF ~ . Among two causal variables (i.e., P9E and ;857), only positive regulation from P9E passes the post-filtering test, i.e., only 1D regulation P9E . CB?<<, but not ;857 3 CB?<< is inferred as a direct regulation. Consequently, after excluding all the indirect regulations, two 1D regulations (P9E . CB?<< and ;857 3 P9E) and one 2D regulation CB?<< . GE<C( . ;857) are inferred." • Section II. D: "… While two positive causal links from AB h and respirable suspended particulates (EZXJY) to the disease are identified as 1D regulations (Fig. 4e middle), we exclude them because they share the same target (Fig. 4e middle, dashed box). Among two inferred 2D regulations, one passes the _ test and surrogate test (Fig. 4d middle)." 30 30

Reviewer: 3
The authors propose a new method to detect regulations and their types in a class of systems suited to model biological oscillators. The methodology is applied successfully to both synthetic and experimental data. For this reason, I think it deserves to be published. However, I believe that a revision of the manuscript is necessary.
1. There is a clash between the authors' claim of a general inference method and the details provided in the text, which focuses mainly on computational biology and regulatory networks. For example, in the introduction, model-based inference is presented only focusing on algorithms designed for regulatory networks, and popular and powerful methods such as Kalman Filter (that requires only a model, not restricting it to any class, and can handle both measurement and dynamical noise) are not even mentioned. Interaction inference is a broad field. Methodology and literature are vast and stretch well beyond the papers highlighted by the authors, which are very specific to regulatory networks. The authors should either broaden their view throughout the paper, including applications, or tune it to the designed audience and preferred realm of application (starting from the title).
We thank the reviewer for highlighting the importance of including the Kalman Filter in the discussion of model-based inference. We have now discussed this in the Introduction section as follows: • Section I: Alternatively, model-based methods infer causality by testing the reproducibility of time-series data with mechanistic models using various methods such as simulated annealing (Gotoh, Tetsuya, et al., PNAS, 2016) and the Kalman Filter (Wang, Zidong, et al.;IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009, Pirgazi, J.;Khanteymoori, A. R. PloS one, 2018). Although testing the reproducibility is computationally expensive, as long as the underlying model is accurate, the model-based inference methods are accurate, even in the presence of synchrony in time series and indirect effect [21][22][23][24][25][26][27][28][29].
We agree with the reviewer's point that our paper focused on examples from biology, covering various scales. Specifically, we investigated gene regulatory networks such as the genetic oscillator (Fig. 4b), repressilator (Fig. 4c), and cofactors at the estradiol-sensitive promoter (Fig.  4d), as well as ecological systems such as the prey-predator system (Fig. 4a) and the impact of air pollutants on cardiovascular disease (Fig. 4e). Please note that previous publications on causation detection in Nature Communications have also focused on examples from biology (e.g., Leng, Siyang, et al., Nat Commun, 2020;Yang, A.C., Peng, CK. & Huang, N.E., Nat Commun, 2018;Orhan, A., Ma, W.J., Nat Commun, 2017). Given this context, we believe that our focus on biological examples aligns with the journal's aim and scope and does not mislead its readership. 31 31 2. The part of the Discussion section regarding the method limitations is very slim. There are limitations and assumptions here and there (such as thresholds when applying to noisy series) on which the authors should spend few words in the Discussion.
We agree with the reviewer that the limitations and assumptions of our method deserve more attention. Although we mentioned these limitations and assumptions throughout the manuscript, we recognize the need to explicitly address them in the Discussion section as the reviewer suggested. We have now clearly described the two limitations and two assumptions of our methods in the discussion section: the choice of threshold when applying to noisy data (please also look at response 7 for Reviewer 2), the choice of sampling rate (please also look at response 3 for Reviewer 1), the assumption about monotonic regulation (please also look at response 6 for Reviewer 1), and the assumption about negative self-regulation (please also look at the responses 4 and 5).
• Section III: " … Despite these advantages, our method has some limitations that should be addressed. First, our framework assumes that when H causes I, H causes I either positively or negatively. Thus, GOBI cannot capture the regulation when H causes I both positively and negatively or when the type of regulation changes over time. However, GOBI can be potentially extended to detect temporal-structured models, including non-monotonic regulation ( Supplementary Fig. 11). It would be interesting in future work to investigate the extended framework thoroughly under diverse circumferences. Additionally, while we have considered the general form of monotonic ODE (Eq. (1)), GOBI can also be extended to describe interactions including time delays ( Supplementary Fig. 12). This will be an interesting future direction to make GOBI more broadly applicable. Also, another limitation is the possibility of false-positive predictions since our method tests the reproducibility of time-series data using necessary conditions. To resolve this, we use multiple time-series data and perform post-filtering tests (i.e., _ test and surrogate test). Nonetheless, it should be noted that inferring high-dimensional regulations requires a large amount of data ( Supplementary Fig. 13). To address this challenge, we can use prior knowledge about the system. For example, in biological systems, negative self-regulation can be assumed, as the degradation rates of molecules increase as their concentrations increase. By assuming negative self-regulation, we are able to reduce the AD regulation to ºA¯(»D regulation, which allows us to successfully infer the network structure even with a small amount of experimental data (Fig. 4c). Note that when a priori assumption (e.g., the types of selfregulation) is not met, only the links that violate the assumptions are not trustable, i.e., the other inference results are not affected (Supplementary Fig. 3).
To use GOBI, we need to choose hyper-parameters. When applying GOBI to noisy data, users must choose thresholds for the regulation-detection region, regulation-detection function, and total-regulation score as well as two critical values of significance (i.e., X-values for _ test and surrogate test). In this study, we determine these values by using noisy simulated data of various examples (Fig. 3 and Supplementary Fig. 5). Nevertheless, these values are effective when they are applied to experimental time-series data (Fig. 4 and 5). Thus, we have set those values of hyper-parameters as the default values of GOBI. However, the optimal threshold may vary depending on the data characteristics, and users may need to adjust the thresholds based on the importance of avoiding false-positive or false-negative predictions. Another hyperparameter that requires consideration is the choice of sampling rate. In this study, we used a sampling rate of 100 points per period after evaluating the trade-off between computational cost and accuracy. However, users can decrease or increase the sampling rate if the computation speed is too slow or if a higher level of accuracy is required, respectively.
3. The paper has several statements that are just too generous towards the method, such as "our approach completely resolves the fundamental limit of model-based inference: strong dependence on a chosen model." The presented method can only be applied to a specific class of systems, so it sits somewhere between model-based inference and model-free inference. Furthermore, in the noisy case, it depends on two thresholds. Tuning down the text, eliminating these kinds of statements, and highlighting the limitation of the methodology does not take anything out of its value.
We completely agree with the reviewer's concern that some of our statements were too generous in the current manuscript. In response, we have toned down the language to be more cautious in our claims as shown below. Also, we have expanded the Discussion section to include a more detailed explanation of the limitations of our approach (see Reviewer 3's comment 2).
• Abstract: "… Here, we address this limitation by deriving an easily testable condition for a general monotonic ODE model to reproduce time-series data. We built a user-friendly computational package, GOBI (General ODE-Based Inference), which is applicable to nearly any monotonic system with positive and negative regulations described by ODE. …" • Section I: "Here, we develop a model-based method that infers interactions among multiple components described by the general monotonic ODE model: where N can be any smooth and monotonic increasing or decreasing functions of H and H u is I in the presence of self-regulation. Thus, our approach considerably resolves the fundamental limit of model-based inference: strong dependence on a chosen model." • Section III: "We develop an inference method that considerably resolves the weakness of model-free and model-based inference methods. We derive the conditions for interactions satisfying the general monotonic ODE (Eq. (1)). As this allows us to easily check the reproducibility of given time-series data with the general monotonic ODE (i.e., the existence of ODE satisfying given time-series data) without fitting, the computational cost is dramatically reduced compared to the previous model-based approaches. Importantly, as our method can be applied to any system described by general monotonic ODE (Eq. (1)), it significantly addresses 33 33 the fundamental limit of the model-based approach (i.e., requirement of a priori model accurately describing the system) ( Supplementary Fig. 4). …" 4. "In most biological systems, the degradation rates of molecules increase as their own concentrations increase; thus we assume that self-regulation is negative for every component in the system". This is another assumption that was not mentioned in presenting the method. If the method is good, this assumption is not needed. On the contrary, it would be an output of the inference. The authors should show that.
GOBI can infer regulations without the assumption of negative self-regulation as it can also detect whether the self-regulation is positive or negative. Thus, the assumption for negative selfregulation is optional and can be used to reduce the computational cost and data requirements. Similarly, if we have any prior knowledge of the part of regulatory networks, we can assume specific regulation types for the subnetworks based on the prior knowledge. To aid the reader's understanding, we have now discussed this in the Supplementary Information as follows: • Supplementary Information: Section VI. Incorporating prior knowledge into the inference from GOBI When using GOBI for inference, users have the option to incorporate prior knowledge into the analysis. This involves assuming the types of regulation for specific parts of the network based on any available prior knowledge they have, which allows effective inference with a limited amount of data. In particular, in biological systems, the degradation rates of molecules typically increase as their own concentrations increase, so users can often assume negative selfregulation, as described in our manuscript. Of course, wrong assumptions may result in some incorrect inference results. To illustrate this, we use the example of the Frzilator (Fig. 2c) to infer the network structure with and without assuming the types of self-regulation.
Using the Frzilator ODE model, we first simulate one, five, and ten time series from different initial conditions which lie in the range of the original limit cycle ). From the time-series data, GOBI successfully infers the true network structure regardless of the amount of data when we assume the negative self-regulation ( Supplementary Fig. 3b, the first row). However, without assuming self-regulation ( Supplementary Fig. 3b, the second row), the inference results differ depending on the amount of data. Specifically, when ten time-series data are used, GOBI successfully infers the network structure. However, as the amount of data used decreases, the inferred network has some incorrect predictions ( Supplementary Fig. 3b, red arrows in the second row). Taken together, assuming the types of self-regulation is not necessary, but it can be a useful option when the amount of available data is limited.
However, making wrong assumptions about the types of self-regulation can lead to some incorrect inference results. To investigate this, we apply GOBI by falsely assuming a positive self-regulation of N and correctly assuming negative self-regulations of K and M ( Supplementary  Fig. 3b, the third row). While GOBI successfully infers N . K and K . M, GOBI fails to infer regulation M 3 N$ This indicates that GOBI can incorrectly infer the regulation targeting the component whose self-regulation is incorrectly assumed. Supplementary Fig. 3. Inference of network structures under different assumptions of selfregulation types. a The Frzilator ODE model (Fig. 2b) is used to simulate one (a (i)), five (a (ii)), and ten (a (iii)) time series from different initial conditions. b Using these time-series data, the inference results from GOBI with three different approaches are illustrated: assuming negative self-regulations (b the first row), not assuming the types of self-regulation (b the second row), and assuming the wrong types of self-regulation (b the third row), where dashed lines indicate the assumed self-regulation types. In the last approach, the positive self-regulation of N is assumed, which is incorrect. Assuming negative self-regulation allows GOBI to accurately infer the network structure regardless of the amount of data (b the first row). However, not assuming the types of self-regulation leads to false-positive predictions (red arrows) when the amount of data is limited (b the second row). Under wrong assumptions of self-regulation types, GOBI only infers regulations targeting components whose self-regulations are correctly assumed (b the third row).
Also, we have now revised our manuscript as follows: Section II. B: "… We apply the framework to infer regulatory networks from simulated timeseries data of various biological models. In these models, the degradation rates of molecules increase as their concentrations increase, like in most biological systems (i.e., self-regulation is negative). Such prior information, including the types of self-regulation, can be incorporated into our framework. For example, to incorporate negative self-regulation, when detecting AD regulation, one can use the (A ® ()D regulation-detection function and score that include negative self-regulation. Specifically, when inferring 1D positive regulation from H to I, the criteria F x y y = 1 is used. To illustrate this, we assume the negative self-regulation to infer the network structures of biological models (see below for details). Note that this assumption is optional for inference (see Supplementary Information for details). … Here, assuming negative self-regulation allows us to reduce AD regulation to (A-1)D regulation ( Supplementary Fig. 3). This simplification is important for accurate inference when data is limited. Moreover, it should be noted that when the assumptions about the types of self-regulation are not met, only the links that violate these assumptions become untrustworthy, while the other inference results are not affected (Supplementary Fig. 3

). …"
Previously, when applying GOBI to the synthetic genetic oscillator example, we assumed negative self-regulation (Fig. 4b). However, as experimental data is sufficient for this case, we no longer need to make this assumption. Thus, we now apply GOBI without assuming negative self-regulation as follows: Section II. D: "… Next, we apply GOBI to the time series of the synthetic genetic oscillator, which consists of Tetracycline repressor (GM[E) and RNA polymerase sigma factor (d hj ) [43] (Fig. 4b left). While the time series are measured under different conditions after adding purified GM[E or inactivating intrinsic GM[E, our method consistently infers the negative feedback loop including negative self-regulations based on two direct regulations d hj . GM[E and GM[E 3 d hj for all cases (Fig. 4b middle and Supplementary Fig. 9b). This indicates that our method can infer regulations even when the data are achieved from different conditions since we do not specify the specific equations with parameters in Eq. (1).
We next investigate the time-series data from a slightly more complex synthetic oscillator, the three-gene repressilator [44] (Fig. 4c left). As the amount of data is greatly reduced compared to the synthetic genetic oscillator (Fig. 4b), we assume self-negative regulation. … Furthermore, compared to the synthetic genetic oscillator (Fig. 4b), the amount of data is small and the number of components is large; thus, it is essential to assume negative self-regulation for correct inference, i.e., without the assumption, the available data is insufficient to fill the space of the regulation-detection function, making it difficult to detect 2D regulations. " • Supplementary Information: Section XII: "… Similarly, for the genetic oscillator, the criteria of TRS for 1D regulation infers two direct regulations, GM[E 3 d hj and d hj . GM[E (Supplementary Fig.9b (i) We thank the reviewer for this comment. When the assumptions about the types of selfregulations are not met, only the links violating the assumptions are not trustable. We have now discussed this in the Supplementary Information (Supplementary Fig. 3 and see Reviewer 3's comment 4 for details).
6. If I well understand, the thresholds for the network inference are derived a posteriori to optimize the inference performance. If so, this should be discussed as a limitation, at least in the Discussion. Which is the sensitivity of the results to the threshold values? And how should the user proceed to choose the thresholds if ground-truth data is not available?
We do not need the ground-truth to choose the thresholds. Specifically, to run inference on noisy time-series data using GOBI, users need to choose three different thresholds for regulation-detection region, regulation-detection score, and total-regulation score. Thus, we suggested guidelines for selecting appropriate thresholds based on the noise level and the dimension of the regulation using simulated time-series data obtained from various ODE systems ). Specifically, our guidelines involve decreasing the thresholds for regulation-detection score and total regulation score as the noise level increases. Note that the noise level can be quantifiable with sole time series (see Methods and Supplementary Fig. 8). Additionally, our guidelines involve decreasing the threshold for regulation-detection region as the dimension of regulation increases (see Supplementary Information for details). As we used these default guidelines of GOBI for the inference from experimental data (Fig. 4), we did not use the experimental data to derive posteriori thresholds for optimizing the inference performance. Specifically, we estimate the noise level in experimental data sets, determine the dimension of regulation that we want to infer, and then apply our guidelines to choose appropriate thresholds accordingly. Indeed, the guidelines derived from simulated data were effective for the experimental data sets as well (Fig. 4 and Supplementary Fig. 9). Also, to test the robustness of our guidelines, we have illustrated the sensitivity of the results to changes in threshold values ( Supplementary Fig. 5). Thus, by using the guidelines we have suggested, users can select appropriate threshold values even in the 37 absence of ground-truth data. Of course, users can also adjust the thresholds depending on whether the goal is decreasing false-positive or false-negative predictions. Nevertheless, we agree with the reviewer's comment regarding the limitations of our proposed thresholds and have explicitly addressed them in the Discussion section (see Reviewer 3's comment 2). Furthermore, we have provided an explicit explanation for our choice of thresholds in the Results section: • Section II D (page 6): "When the proposed thresholds for the regulation-detection score (Fig. 3b) and Total Regulation Score (Fig. 3c) and two critical values of significance (i.e., X-value = 0.01 for the ( test and X-value = 0.001 for the surrogate test) are used, GOBI successfully infers the regulatory networks from in silico time series. Here, we use GOBI with these default hyperparameters to infer regulatory networks from experimentally measured time series. … " Some additional minor points: 7. Sec II A: "positive or negative causation" maybe the authors meant "regulation" rather than "causation" In this study, we defined causation as direct regulation, which means that if H causes I, then the dynamic of I is given as I 5 ± NºH», where N is a monotonic function with respect to H. Consequently, in our manuscript, we used both 'causation' and 'regulation' to refer to this concept, which could be confusing for readers. To address this issue, we have now replaced all instances of 'causation' with 'regulation' in the Results section to clarify our terminology. 8#").,"(("'&"*/."+87/456"6/481-"67+7.",1.+51;"/49"="06".:7.3-.-"74"7/5.."+3-"245." dimensions. The reader can get the idea but being explicit is always better.
We agree with the reviewer's comment regarding the need for a clear explanation of how the _ test is extended to three or more dimensions. In our current manuscript, we have only provided a general definition of the _ test for higher dimensions without further elaboration. To improve the reader's understanding of the _ test for high dimensions, we have now provided explicit explanations including a simple example as follows: I are false-positive predictions, respectively." 9. Sec IIC: The noise in the time series is measurement noise, not dynamical, isn't it? It should be stated clearly in the text. Also, it would be interesting to see, at least for a simple negative feedback loop of two nodes, what would be the impact of dynamical noise.
As the reviewer mentioned, in our current manuscript, the noise in the time series is measurement noise and this needs to be explicitly stated in the manuscript. To address this concern, we have revised the manuscript as follows: Section IV. B: "…To introduce measurement noise in time series, we introduce multiplicative noise sampled randomly from a normal distribution with mean 0 and standard deviation given by the noise level. For example, for 10% multiplicative noise, we add the noise Hº[ » -c to Hº[ ), where c&Aº'" '$( h ). …" Also, to make GOBI more widely applicable, it is crucial to consider the influence of dynamical noise. Thus, we have now illustrated the impact of dynamical noise on the performance of GOBI ( Supplementary Fig. 7). Our findings indicate that GOBI exhibits a comparable level of robustness to dynamical noise as it does to measurement noise (see Reviewer 1's comment 2).
10. How much data is needed to run the inference? How does it depend on the order of the interaction? Is it suitable for large networks? I understand that the space to compute S goes down exponentially but I think the paper needs more quantitative statements We appreciate the reviewer's comment. As the amount of data increases, the confidence in the inference also increases. Furthermore, not only the quantity but also the quality of the data is important. In particular, we found that using multiple short time-series data from different initial conditions can lead to more accurate results compared to using a single long-time series (Fig. 4b). As both the quantity and quality of the time-series data are essential for accurate inference, it is difficult to provide a simple rule for determining how much data is needed to run the inference. However, one guideline we can suggest to the user is to check how much the domain of the regulation-detection function is filled with the data. As this space becomes more filled, GOBI can provide more confident results. We have now revised the computational package, GOBI, to show how much the domain of the regulation detection is filled (see Reviewer 2's comment 4 and Supplementary Fig. 14 Fig. 13b (iii), c (iv), and d (v)). Thus, in this example, twice as much data is required as the dimension increases by one. In case the available data is not sufficient, i.e., when the domain is not adequately filled, our framework will issue a warning that more data is required for confident result.
Supplementary Fig. 13 12. In the supplementary material, the description of the noise level calculation could be improved. Just saying that the library uses a MATLAB functon(of which I couldn't find the documentation) without mentioning what the function does is a bit short, being that an ingredient of the calculation. Same for the MATLAB function "gradient". Is it applied before or after the noise filtering? I guess after, but it should be specified. And what is the underlying algorithm?
We agree with the reviewer that our current manuscript does not fully describe the use of the MATLAB functions. In response to the reviewer's concern, we have now revised the manuscript as follows: • Section IV. A: "Here, we describe the key steps of our computational package, GOBI (Github link will be provided upon acceptance • Supplementary Information (Section XII): " Supplementary Fig. 8. Approximate noise level of experimental data using residual. a For each in silico system, we compute the mean square of the residual between noisy and fitted time series when the noise level varies. Here, fitted time series are obtained by using the MATLAB function 'fit' with an option 'fourier4'. The mean square of the residual is averaged over all components in the system and we simply called it 'residual'. The value of the residual increases as the noise level increases and their tendency is similar among the systems. b Using this tendency, we can approximate the noise level of experimental data. For each system that we used (prey-predator, genetic oscillator, repressilator, estradiol data set and air pollutants and cardiovascular disease), experimental time-series data are interpolated using the MATLAB function 'fit' with an option 'fourier4'. Then, we compute the residuals and approximate their noise level." 13. In the calculation of GC, I couldn't find how you chose the order of the AR processes for Y and X.
In the calculation of GC, we unintentionally omitted the order of the AR processes. We have now revised the Methods section to include the order of the AR processes (see Reviewer 3's comment 14).
14. Speaking of GC, if you shift one series and test GC X(t) ! X(t+T) (Fig.4ab) and GC does not detect that as a link, I would be surprised, as the shifted timeseries becomes an AR process of its past, which is exactly what GC tests for. Maybe the authors could discuss more in depth why they chose to employ this test. Alternatively, I think that if they want a null hypothesis for nonconnection, it could have more sense to use a series from another dataset (such as TetR & P & D).
We thank the reviewer for this comment. We performed the GC using the code provided in [Chandler, 2020.], and specified the maximum order of AR process that produced the first minimum of delayed mutual information. For oscillatory time series, the delayed mutual information typically exhibits its first minimum at a quarter of the period. In our examples , each time series of the prey-predator system and genetic oscillator was duplicated and shifted about half of its period. This indicates that the shifted time series does not become an AR process of its past. Thus, it is possible that GC does not detect the link from the original time series to the shifted one.

42
Nevertheless, we agree with the reviewer's point that the examples of the shifted system were not suitable for testing the model-free methods. When the order of AR process exceeds half of the period, the shifted time series becomes an AR process of its past, and the GC automatically identifies the link from the original time series to the shifted one. Also, shifting the time series does not significantly alter its shadow manifold; thus, PCM and CCM are likely to identify the link between the original and shifted time series. Taking into account these concerns, we have revised our approach by using a new example to compare the performance of GOBI and modelfree methods.
As recommended by the reviewer, we have merged the time series of two different data sets instead of shifting them. Specifically, from the set of eight different time-series data of genetic oscillator measured under different conditions, we selected one that has a similar phase to the time series of the prey-predator system. Then, the selected time series was merged with the time series of the prey-predator system. Using this merged system, we tested GOBI and modelfree methods. We found that both PCM and GOBI successfully inferred the true network structure, whereas CCM and GC did not. Furthermore, when we reduced the sampling rate by half, the accuracy of PCM dramatically dropped, whereas GOBI was still able to infer the true network structure (see Reviewer 1's comment 3). This finding indicates that model-free methods often misidentify synchrony for causality.
Based on this, we have revised the manuscript as follows: Section II. E: "For the prey-predator system and the genetic oscillator ( Fig. 4a-b), we merge them to create a more challenging case (Fig. 5a). Specifically, from the set of eight different time-series data of a genetic oscillator measured under different conditions, we select one that has a similar phase to the time series of the prey-predator system (Fig. 4b panel at the 2 nd row and 2 nd column). Then, we merge the selected time series with the time series of the preypredator system. While GOBI and PCM successfully detect two independent feedback loops (Fig. 5a), CCM and GC infer false positive predictions (e.g., C to d hj in Fig. 5a) because they usually misidentify synchrony as causality. Furthermore, when we reduce the sampling rate by half, the accuracy of PCM dramatically drops, whereas GOBI can still infer the true network structure ( Supplementary Fig. 10)."